A New Inductive Learning Method for Multilabel Text Categorization

Authors

Yu-Chuan Chang

Shyi-Ming Chen

Churn-Jung Liau

Abstract

In this paper, we present a new inductive learning method for multilabel text categorization. The proposed method uses a mutual information measure to select terms and, based on these terms, constructs document descriptor vectors for each category. These document descriptor vectors form a document descriptor matrix. The method also uses the document descriptor vectors to construct a document-similarity matrix based on the cosine similarity measure. It then constructs a term-document relevance matrix by applying the inner product of the document descriptor matrix to the document-similarity matrix. The proposed method infers the degree of relevance of the selected terms in order to construct the category descriptor vector of each category. Then, the relevance score between each category and a testing document is calculated as the inner product of the category descriptor vector and the document descriptor vector of the testing document. The maximum relevance score L is then chosen. If the relevance score between a category and the testing document, divided by L, is not less than a predefined threshold value λ between zero and one, the document is classified into that category. We also compare the classification accuracy of the proposed method with that of existing learning methods (i.e., Find Similar, Naïve Bayes, Bayes Nets and Decision Trees) in terms of the micro-averaged break-even point for categorizing the Reuters-21578 "Apté split" data set. The proposed method achieves a higher average accuracy than the existing methods.
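The two steps the abstract specifies most concretely, the cosine-based document-similarity matrix and the thresholded classification rule, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy vectors, the category labels, and the handling of an all-zero vector or a non-positive maximum score are assumptions made here. Term selection by mutual information and the construction of the category descriptor vectors are not shown, since the abstract does not give enough detail to reproduce them.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors.

    Returning 0.0 for an all-zero vector is an assumption of this sketch.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(doc_vectors):
    """Pairwise cosine similarities between document descriptor vectors."""
    return [[cosine_similarity(u, v) for v in doc_vectors] for u in doc_vectors]


def classify(doc_vector, category_vectors, lam):
    """Assign every category whose relevance score is at least lam * L.

    doc_vector:       descriptor vector of the testing document
    category_vectors: dict mapping a category name to its descriptor vector
    lam:              threshold between zero and one (the abstract's lambda)
    """
    # Relevance score = inner product of category and document descriptors.
    scores = {
        name: sum(c * d for c, d in zip(vec, doc_vector))
        for name, vec in category_vectors.items()
    }
    max_score = max(scores.values())  # the abstract's L
    if max_score <= 0:
        # Assumption of this sketch: no category is assigned in this case.
        return []
    return sorted(name for name, s in scores.items() if s / max_score >= lam)


# Toy data (hypothetical): three categories over three selected terms.
categories = {
    "grain": [3, 1, 0],
    "wheat": [2, 0, 2],
    "trade": [0, 4, 1],
}
test_doc = [1, 0, 1]

print(classify(test_doc, categories, lam=0.7))
print(similarity_matrix([[1, 0], [0, 1], [1, 0]]))
```

With these toy values the relevance scores are 3, 4 and 1, so L = 4 and a threshold of 0.7 keeps the two categories whose ratio to L is at least 0.7. Lowering λ admits more labels per document, while λ = 1 keeps only the top-scoring category (and any ties).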


Similar resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...


Multi-topic Text Categorization Based on Ranking Approach

This paper is devoted to the multi-topic (multilabel) text classification problem. We propose two methods for reduction from ranking to the multi-label case. Unlike existing multi-label classification methods based on reduction from ranking problem, where the complex classification (threshold) function is being defined on the input feature space, in our approach we propose the construction of s...


A Unified Model for Multilabel Classification and Ranking

Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially e...


Efficient Text Categorization Using a Min-Max Modular Support Vector Machine

The min-max modular support vector machine (M-SVM) has been proposed for solving large-scale and complex multiclass classification problems. In this paper, we apply the M-SVM to multilabel text categorization and introduce two task decomposition strategies into M-SVMs. A multilabel classification task can be split up into a set of two-class classification tasks. These two-class tasks are to dis...


Mining Methods in a Digital Library

The article deals with the use of the Itemsets classifier, based on inductive machine learning, in the context of a digital library environment. We provide a brief description of a real-world digital library implemented at a power utility. Its implementation and operating experience have motivated our research in inductive machine learning methods for text mining described in the paper. Being inspired by m...


Publication date: 2006
